Abstract


The R package CEC (Kamieniecki and Spurek 2014) performs clustering based on
the cross-entropy clustering (CEC) method, which was recently developed with the use
of information theory. The main advantage of CEC is that it combines the speed and
simplicity of k-means with the ability to use various Gaussian mixture models and to reduce
unnecessary clusters. In this work we present a practical tutorial to CEC based on the R
package CEC. Functions are provided to encompass the whole process of clustering.

Keywords: clustering, Gaussian models, density estimation.


1 Introduction 

Clustering plays a basic role in many parts of data engineering, machine learning, pattern
recognition and image analysis, see Hartigan (1975); Jain and Dubes (1988); Jain, Murty, and
Flynn (1999); Jain (2010); Xu and Wunsch (2009). Thus, it is not surprising that numerous
clustering methods have been implemented as R packages, e.g., mclust (Fraley and Raftery
1999), pdfCluster (Azzalini and Menardi 2014), mixtools (Benaglia, Chauveau, Hunter,
and Young 2009), clues (Chang, Qiu, Zamar, Lazarus, and Wang 2010), HDclassif (Berge,
Bouveyron, and Girard 2012), ClustOfVar (Chavent, Kuentz-Simonet, Liquet, and Saracco
2012), etc.

Several of the most popular clustering methods are based on the k-means approach, see Bock
(2007, 2008). Although k-means is easily scalable, it tends to divide the data into
spherically shaped clusters of similar sizes. Consequently, it is not affine invariant and
does not deal well with clusters of various sizes. This causes the so-called mouse effect, see
Fig. 1(a). Moreover, it does not adjust the number of clusters dynamically, see Fig. 1(b).
Therefore, to apply k-means efficiently, data preprocessing (like whitening) usually needs
to be applied, and additional tools for choosing the right number of groups, such as the gap
statistic (Tibshirani, Walther, and Hastie 2001; Mirkin 2011), have to be used.

Another group of clustering methods is based on density estimation techniques which use the
Expectation Maximization (EM) method (McLachlan and Krishnan 1997; Samé, Ambroise,
and Govaert 2007). Probably the most popular of them is the Gaussian Mixture Model (GMM),
see McLachlan and Krishnan (2007); McLachlan and Peel (2004). It is hard to overestimate
the role of GMM and its generalizations in computer science (McLachlan and Krishnan
2007; McLachlan and Peel 2004; Jain and Dubes 1988), in particular in object detection
(Huang 1998; Figueiredo and Jain 2002), object tracking (Xiong, Chen, Wang, and Huang
2002), learning and modeling (Samuelsson 2004), feature selection (Valente and Wellekens
2004), classification (Povinelli, Johnson, Lindgren, and Ye 2004) or statistical background
subtraction.

The relation between the above two methods is well described by Estivill-Castro and Yang
(2000): "[...] The weaknesses of k-means results in poor quality clustering, and thus, more
statistically sophisticated alternatives have been proposed. [...] While these alternatives offer
more statistical accuracy, robustness and less bias, they trade this for substantially more
computational requirements and more detailed prior knowledge, see Massa, Paolucci, and
Puliafito (1999)."

The Cross-Entropy Clustering (CEC) approach proposed by Tabor and Spurek (2014) joins
the clustering advantages of k-means and EM. It turns out that CEC inherits the speed and
scalability of k-means, while retaining the ability of EM to use mixture models. In particular,
contrary to GMM, new models can easily be added without the need for complicated
optimization (Section 4). Consequently, CEC can be used as an elliptic pattern
recognition tool (Tabor and Misztal 2013; Spurek, Tabor, and Zajac 2013). The motivation
of CEC comes from the observation that in coding it is often profitable to use
various compression algorithms specialized in different data types. The idea is based on
classical Shannon entropy theory, see Cover, Thomas, Wiley et al (1991); MacKay (2003);
Shannon (2001), and on the Minimum Description Length Principle (MDLP) (Grünwald 2007;
Grünwald, Myung, and Pitt 2005). A similar approach, which uses MDLP for image segmentation,
was given by Ma, Derksen, Hong, and Wright (2007); Yang, Wright, Ma, and Sastry (2008).
A close approach from the Bayesian perspective can also be found in the works of Kulis and
Jordan (2012); Kurihara and Welling (2009); Korzeń, Jaroszewicz, and Klęsk (2013).

CEC allows an automatic reduction of “unnecessary” clusters since, contrary to classical
k-means and EM, there is a cost of using each cluster. To illustrate this, consider the
result of spherical CEC in Figure 1(e), where the process started with k = 10 randomly
chosen initial clusters, which were reduced automatically by the algorithm. The step-by-step
view of this process can be seen in Figure 2, which illustrates the subsequent steps of
spherical CEC on data distributed uniformly inside a circle and initially divided into two
almost equal parts.

There are several probabilistic methods which try to estimate the correct number of clusters.
For example, Goldberger and Roweis (2004) use the generalized distance between Gaussian
Mixture Models with different numbers of components, based on the Kullback-Leibler
divergence, see Cover et al (1991); Kullback (1997). A similar approach was presented by Zhang,
Zhang, and Yi (2004) (Competitive Expectation Maximization), which uses the Minimum
Message Length criterion provided by Figueiredo and Jain (2002). In practice, MDLP can
also be used directly in clustering, see Wallace and Kanade (1990). However, most of the
above-mentioned methods typically proceed through all the consecutive numbers of clusters
and do not reduce the number of clusters on-line during the clustering process.

Figure 1: Clustering of the uniform density on a mouse-like set by various types of algorithms:
(a) k-means with k = 3 initial clusters; (b) k-means with k = 10 initial clusters; (c) spherical
Mclust with k = 3 initial clusters; (d) spherical Mclust with k = 10 initial clusters; (e) spherical
CEC with k = 10 initial clusters, reduced to k = 3; (f) general Gaussian CEC with k = 10
initial clusters, reduced to k = 6.

For the convenience of the reader, the contents of the article are briefly summarized.
In the next section a short introduction to the CEC algorithm is provided, together with
formulas for the cross-entropy (which corresponds to maximum likelihood estimation) of the
studied data with respect to the given Gaussian models. The third section presents the
structure of the R package CEC, the properties of the different Gaussian models and assorted
examples. A model mixing different types of clusters (“mixed CEC”) is presented in the
fourth section.

Figure 2: The step-by-step view of cluster reduction in the case of a disc-like set.


2 Theoretical background of CEC 

Recall that, in general, EM aims to find

$$p_1, \dots, p_k \geq 0 \colon \sum_{i=1}^{k} p_i = 1, \tag{1}$$

and $f_1, \dots, f_k \in \mathcal{F}$, where $\mathcal{F}$ is a fixed (usually Gaussian) family of densities, such that the
convex combination

$$f := p_1 f_1 + \dots + p_k f_k \tag{2}$$

optimally approximates the scattering of the data under consideration $X = \{x_1, \dots, x_n\}$. The
optimization is taken with respect to an MLE-based¹ cost function

$$EM(f, X) := -\frac{1}{|X|} \sum_{j=1}^{n} \ln\big(p_1 f_1(x_j) + \dots + p_k f_k(x_j)\big), \tag{3}$$

where $|X|$ denotes the cardinality of a set $X$.

The optimization in EM consists of the Expectation and Maximization steps. While the
Expectation step is relatively simple, the Maximization step usually (except for the simplest case
when the family $\mathcal{F}$ denotes all Gaussian densities) needs a complicated numerical optimization.

The goal of CEC is similar: it aims at minimizing a cost function obtained from (3) by a
small modification, namely by substituting the sum with a maximum:

$$CEC(f, X) := -\frac{1}{|X|} \sum_{j=1}^{n} \ln\big(\max(p_1 f_1(x_j), \dots, p_k f_k(x_j))\big), \tag{4}$$

where all $p_i$, for $i = 1, \dots, k$, satisfy condition (1). It turns out, see Tabor and Spurek (2014),
that the above formula implies that, contrary to EM, it is profitable to reduce some clusters (as
each cluster has its cost). Consequently, after minimization, the probabilities $p_i$ for some indices
$i \in \{1, \dots, k\}$ will typically equal zero, which means that the clusters they potentially
represent have disappeared. Thus $k$, contrary to the case of EM, does not denote the final
number of clusters obtained, but is only an upper bound on the number of clusters of interest
(from a series of experiments the authors found that a good initial guess is typically
$k = 10$). Instead of focusing on density estimation as its first aim, CEC concerns
clustering, where, similarly to EM, the point $x$ is assigned to the cluster $i$ which maximizes
the value $p_i f_i(x)$.

¹ Since in clustering the aim is typically to minimize the cost function, the function in (3) is the
(normalized) log-likelihood with a changed sign.

However, given the solution to CEC, a good estimation of the underlying density is obtained by
applying formula (2), which in practical cases is very close (with respect to the MLE cost
function (3)) to the one constructed by EM.

Let us remark that the seemingly small difference between the cost functions (3) and
(4) has profound consequences, which follow from the fact that the densities in (4) do
not “cooperate” to build the final approximation of $f$.
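To make the difference between (3) and (4) concrete, the following R sketch evaluates both cost functions for a one-dimensional mixture of two Gaussians and assigns each point to the component maximizing $p_i f_i(x)$. The data and mixture parameters are illustrative assumptions, not taken from the paper, and the helper names are hypothetical.

```r
# p_i * f_i(x_j) for all points (rows) and components (columns)
weighted_densities <- function(x, p, mu, s) {
  sapply(seq_along(p), function(i) p[i] * dnorm(x, mean = mu[i], sd = s[i]))
}
# EM cost (3): minus the mean log of the mixture density
em_cost <- function(x, p, mu, s) {
  -mean(log(rowSums(weighted_densities(x, p, mu, s))))
}
# CEC cost (4): the sum over components is replaced by the maximum
cec_cost <- function(x, p, mu, s) {
  -mean(log(apply(weighted_densities(x, p, mu, s), 1, max)))
}

set.seed(1)
x  <- c(rnorm(100, mean = 0), rnorm(100, mean = 5))   # illustrative data
p  <- c(0.5, 0.5); mu <- c(0, 5); s <- c(1, 1)         # illustrative parameters

em_cost(x, p, mu, s)
cec_cost(x, p, mu, s)
# CEC membership: index of the component maximizing p_i * f_i(x)
cl <- max.col(weighted_densities(x, p, mu, s), ties.method = "first")
```

Since the maximum of nonnegative numbers never exceeds their sum, the CEC cost is never smaller than the EM cost for the same parameters; for well separated components the two are close.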

The general idea of cross-entropy clustering relies on finding a splitting of $X \subset \mathbb{R}^N$ into
pairwise disjoint sets $X_1, \dots, X_k$ such that the overall inner information cost of the clusters, given
in (4), is minimal. Consequently, to explain CEC, the cost function to be minimized needs to be
introduced. To do so, recall that the cross-entropy of a data set $X$ with respect
to a density $f$ is given by

$$H^{\times}(X \| f) := -\frac{1}{|X|} \sum_{x \in X} \ln f(x).$$

Thus, using the information theory approach based on differential entropy (Cover et al 1991)
instead of the statistical (MLE) point of view, the value of $-\ln f(x)$ in the above sum may
be interpreted as the length of the code of $x$ with respect to the coding $f$. In the case of splitting
$X \subset \mathbb{R}^N$ into $X_1, \dots, X_k$ such that the elements of $X_i$ are “coded” by the density $f_i$, it can be
proven (following Tabor and Spurek 2014) that the mean code-length of a randomly chosen
element $x \in X$ equals

$$CEC(X_1, f_1; \dots; X_k, f_k) := \sum_{i=1}^{k} p_i \cdot \big(-\ln(p_i) + H^{\times}(X_i \| f_i)\big), \quad \text{where } p_i = \frac{|X_i|}{|X|}. \tag{5}$$

Roughly speaking, the first component $-\ln(p_i)$ in the brackets on the right-hand side is the number of
nats necessary to identify which algorithm is used for coding the element $x \in X_i$, and the
second one, $H^{\times}(X_i \| f_i)$, is the mean code-length of coding $X_i$ by the density $f_i$. Thus, the use
of each cluster is naturally penalized by the term $-\ln(p_i)$ (the cost of identifying
the cluster), which consequently causes the reduction of those clusters which do not improve
the total quality of clustering.
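The decomposition (5) can be checked numerically. The R sketch below uses made-up one-dimensional data, an arbitrary partition into $X_1, X_2$ and fixed coding densities $f_1, f_2$ (all illustrative assumptions), and compares the average code length $-\ln(p_{\mathrm{cl}(x)} f_{\mathrm{cl}(x)}(x))$ over all points with the right-hand side of (5).

```r
set.seed(2)
x  <- c(rnorm(60, 0, 1), rnorm(40, 4, 2))   # illustrative data
cl <- ifelse(x < 2, 1, 2)                    # an arbitrary partition into X_1, X_2
mu <- c(0, 4); s <- c(1, 2)                  # coding densities f_1, f_2 (assumed)

p <- tabulate(cl, nbins = 2) / length(x)     # p_i = |X_i| / |X|

# cross-entropy H(X_i || f_i) = -(1/|X_i|) * sum over X_i of ln f_i(x)
cross_entropy <- function(xi, m, sd) -mean(dnorm(xi, m, sd, log = TRUE))

# left: mean over all points of the code length -ln(p_cl(x) * f_cl(x)(x))
lhs <- -mean(log(p[cl]) + dnorm(x, mu[cl], s[cl], log = TRUE))
# right: sum_i p_i * (-ln p_i + H(X_i || f_i)), formula (5)
rhs <- sum(sapply(1:2, function(i)
  p[i] * (-log(p[i]) + cross_entropy(x[cl == i], mu[i], s[i]))))
all.equal(lhs, rhs)   # TRUE
```

The identity holds for any partition, because grouping the per-point code lengths by cluster turns the mean over $X$ into the weighted sum over clusters.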

To use density families efficiently in CEC, only the optimal code-length of $X$ with respect to
a density family $\mathcal{F}$ needs to be computed:

$$H^{\times}(X \| \mathcal{F}) := \inf_{f \in \mathcal{F}} H^{\times}(X \| f).$$
Optimization condition.

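Summarizing, given the density families $\mathcal{F}_1, \dots, \mathcal{F}_k$, the goal of the CEC algorithm is to divide
the data set $X$ into $k$ (possibly empty) clusters $X_1, \dots, X_k$ such that the value of the function

$$CEC(X_1, \mathcal{F}_1; \dots; X_k, \mathcal{F}_k) := \sum_{i=1}^{k} p_i \cdot \big(-\ln(p_i) + H^{\times}(X_i \| \mathcal{F}_i)\big), \quad \text{where } p_i = \frac{|X_i|}{|X|}, \tag{6}$$

is minimal.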


In practice, the exact formula for the cross-entropy $H^{\times}(X \| \mathcal{F})$ for the most common
subfamilies $\mathcal{F}$ of all Gaussian densities can easily be derived, see Tabor and Spurek (2014);
Tabor and Misztal (2013) (below, the formulas for the six most commonly encountered
Gaussian subfamilies are given). Since each Gaussian is uniquely identified by
its mean and covariance, notation is needed for the estimators of the mean and covariance of the
random variable whose realization is given by the data set $X$. As usual, by
the estimators of the mean and covariance we take

$$m_X := \frac{1}{|X|} \sum_{x \in X} x, \qquad \Sigma_X := \frac{1}{|X|} \sum_{x \in X} (x - m_X)(x - m_X)^T.$$
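In R, the estimator $\Sigma_X$ above differs from `cov()`, which divides by $|X| - 1$. A minimal sketch of both estimators with the $1/|X|$ normalization (the function names are hypothetical), applied to the built-in `faithful` data:

```r
# m_X: mean of the rows of X
mean_mle <- function(X) colMeans(as.matrix(X))

# Sigma_X with the 1/|X| normalization used in the text
cov_mle <- function(X) {
  X <- as.matrix(X)
  centered <- sweep(X, 2, colMeans(X))   # rows x - m_X
  crossprod(centered) / nrow(X)          # (1/|X|) sum (x - m_X)(x - m_X)^T
}

X   <- as.matrix(faithful)               # Old Faithful data, 272 x 2
m_X <- mean_mle(X)
S_X <- cov_mle(X)
all.equal(S_X, cov(X) * (nrow(X) - 1) / nrow(X))   # TRUE
```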

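We can now present the exact formulas for the cross-entropy of the Gaussian subfamilies
implemented in the CEC package; $N$ denotes the dimension of the data.

1. $\mathcal{G}_{\Sigma}$ - Gaussian densities with covariance $\Sigma$. The clustering will have the tendency to
divide the data into clusters resembling balls with respect to the Mahalanobis distance $\|\cdot\|_{\Sigma}$.

$$H^{\times}(X \| \mathcal{G}_{\Sigma}) = \frac{N}{2} \ln(2\pi) + \frac{1}{2} \operatorname{tr}(\Sigma^{-1} \Sigma_X) + \frac{1}{2} \ln\det(\Sigma)$$

2. $\mathcal{G}_{rI}$ - the subfamily of $\mathcal{G}_{\Sigma}$ for $\Sigma = rI$ with fixed $r > 0$, which consists of spherical
(radial) Gaussians with covariance matrix $rI$ (the clustering will have the tendency to divide the
data into balls with fixed radius proportional to $\sqrt{r}$).

$$H^{\times}(X \| \mathcal{G}_{rI}) = \frac{N}{2} \ln(2\pi) + \frac{1}{2r} \operatorname{tr}(\Sigma_X) + \frac{N}{2} \ln r$$

3. $\mathcal{G}_{(\cdot I)}$ - spherical (radial) Gaussian densities, i.e. those Gaussians whose
covariance is proportional to the identity. The clustering will try to divide the data into balls of
arbitrary sizes.

$$H^{\times}(X \| \mathcal{G}_{(\cdot I)}) = \frac{N}{2} \ln(2\pi e / N) + \frac{N}{2} \ln(\operatorname{tr} \Sigma_X)$$

4. $\mathcal{G}_{\mathrm{diag}}$ - Gaussians with diagonal covariance. The clustering will try to divide the data
into ellipsoids with axes parallel to the coordinate axes. The optimal covariance is $\Sigma = \operatorname{diag}(\Sigma_X)$ and

$$H^{\times}(X \| \mathcal{G}_{\mathrm{diag}}) = \frac{N}{2} \ln(2\pi e) + \frac{1}{2} \ln\det(\operatorname{diag}(\Sigma_X))$$

5. $\mathcal{G}_{\lambda_1, \dots, \lambda_N}$ - Gaussian densities with the covariance matrix having eigenvalues $\lambda_1, \dots, \lambda_N$
such that $\lambda_1 \leq \dots \leq \lambda_N$. The clustering will try to divide the data into ellipsoids with fixed
shape rotated by an arbitrary angle. Writing $\lambda_1^X \leq \dots \leq \lambda_N^X$ for the eigenvalues of $\Sigma_X$,

$$H^{\times}(X \| \mathcal{G}_{\lambda_1, \dots, \lambda_N}) = \frac{N}{2} \ln(2\pi) + \frac{1}{2} \sum_{i=1}^{N} \frac{\lambda_i^X}{\lambda_i} + \frac{1}{2} \ln(\lambda_1 \cdots \lambda_N)$$

6. $\mathcal{G}$ - all Gaussian densities. In this case the data set is divided into ellipsoid-like clusters
without any preferences concerning the size or the shape of the ellipsoids.

$$H^{\times}(X \| \mathcal{G}) = \frac{N}{2} \ln(2\pi e) + \frac{1}{2} \ln\det(\Sigma_X)$$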
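The six formulas translate directly into R. The sketch below takes the covariance estimate $\Sigma_X$ as input; the function names are hypothetical and not part of the CEC package, and the data in the usage lines are illustrative.

```r
# Cross-entropies H(X || F) of the six Gaussian subfamilies, as functions of
# the 1/|X|-normalized covariance S_X of the data (N = nrow(S_X)).
h_covariance <- function(S_X, Sigma) {            # G_Sigma: fixed covariance Sigma
  N <- nrow(S_X)
  N / 2 * log(2 * pi) + 0.5 * sum(diag(solve(Sigma, S_X))) + 0.5 * log(det(Sigma))
}
h_fixedr <- function(S_X, r) {                    # G_rI: covariance r * I
  h_covariance(S_X, r * diag(nrow(S_X)))
}
h_spherical <- function(S_X) {                    # G_(.I): spherical, any size
  N <- nrow(S_X)
  N / 2 * log(2 * pi * exp(1) / N) + N / 2 * log(sum(diag(S_X)))
}
h_diagonal <- function(S_X) {                     # G_diag: diagonal covariance
  N <- nrow(S_X)
  N / 2 * log(2 * pi * exp(1)) + 0.5 * log(prod(diag(S_X)))
}
h_eigenvalues <- function(S_X, lambda) {          # fixed eigenvalues lambda
  N <- nrow(S_X)
  lambda_X <- sort(eigen(S_X, symmetric = TRUE, only.values = TRUE)$values)
  N / 2 * log(2 * pi) + 0.5 * sum(lambda_X / sort(lambda)) + 0.5 * log(prod(lambda))
}
h_all <- function(S_X) {                          # G: all Gaussians
  N <- nrow(S_X)
  N / 2 * log(2 * pi * exp(1)) + 0.5 * log(det(S_X))
}

# Usage on illustrative correlated 2-D data
set.seed(3)
X <- matrix(rnorm(400), ncol = 2) %*% matrix(c(2, 0.5, 0, 1), 2)
S_X <- crossprod(sweep(X, 2, colMeans(X))) / nrow(X)
all.equal(h_covariance(S_X, S_X), h_all(S_X))                   # TRUE
all.equal(h_fixedr(S_X, sum(diag(S_X)) / 2), h_spherical(S_X))  # TRUE (optimal r)
c(all = h_all(S_X), diag = h_diagonal(S_X), spherical = h_spherical(S_X))
```

The second check reflects that the spherical cost is the fixed-radius cost at the optimal $r = \operatorname{tr}(\Sigma_X)/N$. Since spherical Gaussians are diagonal and diagonal Gaussians are Gaussian, the printed costs can only grow as the family becomes more restrictive.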



The algorithm behind CEC 

In this subsection basic information about the implementation of CEC in R is presented.
As usual in clustering, the process starts with the initialization of the clusters, which can
be done either by choosing centers from the data set randomly and assigning points to the
nearest center, or by the k-means++ approach proposed by Arthur and Vassilvitskii (2007).
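The k-means++ initialization can be sketched in base R as follows. This is a generic implementation of the seeding rule of Arthur and Vassilvitskii (2007), followed by nearest-center assignment, and not the code used inside the CEC package; the function name `kmeanspp_init` is hypothetical.

```r
# k-means++ seeding followed by nearest-center assignment (generic sketch)
kmeanspp_init <- function(X, k) {
  X <- as.matrix(X)
  n <- nrow(X)
  sq_dist <- function(ci) rowSums(sweep(X, 2, X[ci, ])^2)  # squared distances to row ci
  centers <- integer(k)
  centers[1] <- sample.int(n, 1)                 # first center: uniform
  d2 <- sq_dist(centers[1])
  for (j in seq_len(k)[-1]) {
    centers[j] <- sample.int(n, 1, prob = d2)    # probability proportional to D(x)^2
    d2 <- pmin(d2, sq_dist(centers[j]))          # distance to nearest chosen center
  }
  dist2 <- sapply(centers, sq_dist)              # n x k matrix of squared distances
  list(centers = X[centers, , drop = FALSE],
       cluster = max.col(-dist2, ties.method = "first"))
}

set.seed(4)
X <- rbind(matrix(rnorm(100), ncol = 2), matrix(rnorm(100, mean = 10), ncol = 2))
init <- kmeanspp_init(X, 3)
table(init$cluster)
```

Points already chosen as centers have zero distance and cannot be drawn again, so the sketch always returns $k$ distinct centers for data without duplicate points.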

Since CEC, from both the implementation and the theoretical point of view, is a generalization
of k-means, the typical approaches used in k-means can be applied in the search for the
minimum of the cost function: Lloyd's and Hartigan's methods. From the practical point
of view, Hartigan's approach finds smaller minima and reduces unnecessary clusters more
effectively, but at the cost of recomputing the covariance at each pass through every data
point. Consequently, the authors suggest using Hartigan's method for low-dimensional data
and Lloyd's method for high-dimensional data.

The idea of Hartigan's method is to proceed over all elements of $X$ and switch each membership
to the cluster which would maximally decrease the cost function. Since in the discussed
approach clusters can be removed, the classical Hartigan approach is slightly modified.


Hartigan’s procedure 
Consider $k$ density subfamilies $\mathcal{F}_1, \dots, \mathcal{F}_k$². To explain the Hartigan approach more precisely,
the notion of a cluster membership function is needed:

$$\mathrm{cl} \colon X \to \{0, \dots, k\},$$

where the element $x \in X$ belongs to the $\mathrm{cl}(x)$-th cluster $X_{\mathrm{cl}(x)}$ (0 is reserved as a special symbol
which denotes that $x$ is unassigned).

We look for a cluster membership function $\mathrm{cl} \colon X \to \{1, \dots, k\}$ (thus all elements of $X$ are
assigned) such that the value of

$$CEC(X_1, \mathcal{F}_1; \dots; X_k, \mathcal{F}_k), \quad \text{where } X_i := \{x \in X : \mathrm{cl}(x) = i\},$$

is minimal.

The basic idea of Hartigan's method is relatively simple: the process goes over all elements of $X$ and
applies the following steps:

• if the chosen $x \in X$ is unassigned, assign it to an arbitrary nonempty cluster;

• reassign $x$ to the cluster for which the decrease in cross-entropy is maximal;

• check whether some cluster needs to be removed³; if this is the case, remove it and unassign all its elements;

until no cluster membership has changed during a whole iteration over the set $X$.
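The procedure above can be sketched in base R for the family $\mathcal{G}$ of all Gaussians, using the cluster cost $p_i(-\ln p_i + H^{\times}(X_i \| \mathcal{G}))$ from (6) and the last formula of the previous list. This is a simplified illustration under stated assumptions, not the package implementation: cluster costs are recomputed from scratch instead of being updated on-line, an unassigned point is put into the cluster whose cost increases least (one admissible choice for the first step), and a cluster is removed when it has fewer than `card_min` times $|X|$ points (and at least $N + 2$). The function name `cec_hartigan` is hypothetical.

```r
# Simplified Hartigan-type CEC for the family G of all Gaussians (sketch only)
cec_hartigan <- function(X, k, card_min = 0.05, max_iter = 100) {
  X <- as.matrix(X); n <- nrow(X); N <- ncol(X)
  min_size <- max(N + 2, ceiling(card_min * n))   # clusters below this are removed

  # energy p_i * (-ln p_i + H(X_i || G)) of the cluster with row indices idx
  term <- function(idx) {
    m <- length(idx)
    if (m == 0) return(0)
    if (m <= N) return(Inf)                       # covariance would be singular
    Xi <- X[idx, , drop = FALSE]
    S <- crossprod(sweep(Xi, 2, colMeans(Xi))) / m
    p <- m / n
    p * (-log(p) + N / 2 * log(2 * pi * exp(1)) + 0.5 * log(det(S)))
  }
  # unassign (label 0) all points of clusters smaller than min_size
  remove_small <- function() {
    sizes <- tabulate(cl[cl > 0], nbins = k)
    small <- which(sizes > 0 & sizes < min_size)
    cl[cl %in% small] <<- 0L
    length(small) > 0
  }

  cl <- sample(rep_len(seq_len(k), n))            # random initial membership
  remove_small()
  for (iter in seq_len(max_iter)) {
    changed <- FALSE
    for (j in seq_len(n)) {
      a <- cl[j]
      others <- setdiff(which(tabulate(cl[cl > 0], nbins = k) > 0), a)
      if (length(others) == 0) next
      # energy change from taking x_j out of its current cluster (if any)
      out <- if (a > 0) term(setdiff(which(cl == a), j)) - term(which(cl == a)) else 0
      # energy change from putting x_j into each candidate cluster
      into <- vapply(others, function(b)
        term(c(which(cl == b), j)) - term(which(cl == b)), numeric(1))
      best <- which.min(into)
      if (a == 0 || out + into[best] < -1e-12) {  # assign, or move if energy drops
        cl[j] <- others[best]
        changed <- TRUE
      }
    }
    if (remove_small()) changed <- TRUE
    if (!changed) break
  }
  ids <- sort(unique(cl[cl > 0]))
  list(cluster = cl, nclusters = length(ids),
       energy = sum(vapply(ids, function(i) term(which(cl == i)), numeric(1))))
}

set.seed(5)
X <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
           matrix(rnorm(200, mean = 8), ncol = 2))
fit <- cec_hartigan(X, k = 6)
fit$nclusters   # can be smaller than k, since clusters are removed
```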

Observe that, when dealing with the Gaussian families discussed above, only the cardinality
of $X_i$ and its covariance need to be known to compute $H^{\times}(X_i \| \mathcal{F}_i)$. This implies that in
practice the whole cluster $X_i$ does not need to be remembered: it is sufficient to know its
covariance and cardinality.

It turns out that in practice, after adding a point $x$ to the cluster or deleting it, the covariance and
cardinality of $X_i$ can be updated on-line. Therefore, only the cardinality, the mean and the
covariance matrix of $X_i$ need to be remembered.

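This follows from the observation below ($Y_1$ plays the role of the cluster $X_i$, and $Y_2$ denotes by
default the point $x$ which we either add to or remove from the cluster $X_i$). Consider sets $Y_1, Y_2 \subset \mathbb{R}^N$:

a) First, the case⁴ $Y_1 \cap Y_2 = \emptyset$ is discussed. Then

$$m_{Y_1 \cup Y_2} = p_1 m_{Y_1} + p_2 m_{Y_2}, \qquad \Sigma_{Y_1 \cup Y_2} = p_1 \Sigma_{Y_1} + p_2 \Sigma_{Y_2} + p_1 p_2 (m_{Y_1} - m_{Y_2})(m_{Y_1} - m_{Y_2})^T,$$

where $p_1 = \frac{|Y_1|}{|Y_1 \cup Y_2|}$ and $p_2 = \frac{|Y_2|}{|Y_1 \cup Y_2|}$.

b) Assume⁵ that $Y_2 \subset Y_1$. Then

$$m_{Y_1 \setminus Y_2} = q_1 m_{Y_1} - q_2 m_{Y_2}, \qquad \Sigma_{Y_1 \setminus Y_2} = q_1 \Sigma_{Y_1} - q_2 \Sigma_{Y_2} - q_1 q_2 (m_{Y_1} - m_{Y_2})(m_{Y_1} - m_{Y_2})^T,$$

where $q_1 := \frac{|Y_1|}{|Y_1 \setminus Y_2|}$ and $q_2 := \frac{|Y_2|}{|Y_1 \setminus Y_2|}$.

From the above equations, the formulas for adding or removing a single point can
easily be obtained.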
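The update formulas a) and b) are easy to verify numerically. The R sketch below (helper names are hypothetical) stores a cluster by its cardinality, mean and $1/|Y|$-normalized covariance, adds one point with formula a), removes it again with formula b), and compares the results with statistics computed from scratch.

```r
# cardinality, mean and 1/|Y|-normalized covariance of the rows of Y
cluster_stats <- function(Y) {
  Y <- as.matrix(Y)
  m <- colMeans(Y)
  list(n = nrow(Y), m = m, S = crossprod(sweep(Y, 2, m)) / nrow(Y))
}
# a) disjoint union Y1 u Y2 (adding points to a cluster)
merge_stats <- function(s1, s2) {
  n <- s1$n + s2$n
  p1 <- s1$n / n; p2 <- s2$n / n
  d <- s1$m - s2$m
  list(n = n, m = p1 * s1$m + p2 * s2$m,
       S = p1 * s1$S + p2 * s2$S + p1 * p2 * tcrossprod(d))
}
# b) difference Y1 \ Y2 for Y2 contained in Y1 (removing points from a cluster)
remove_stats <- function(s1, s2) {
  n <- s1$n - s2$n
  q1 <- s1$n / n; q2 <- s2$n / n
  d <- s1$m - s2$m
  list(n = n, m = q1 * s1$m - q2 * s2$m,
       S = q1 * s1$S - q2 * s2$S - q1 * q2 * tcrossprod(d))
}

set.seed(6)
Y1 <- matrix(rnorm(30), ncol = 3)           # a cluster of 10 points in R^3
x  <- matrix(rnorm(3), ncol = 3)            # a single point
added   <- merge_stats(cluster_stats(Y1), cluster_stats(x))
removed <- remove_stats(cluster_stats(rbind(Y1, x)), cluster_stats(x))
all.equal(added$S, cluster_stats(rbind(Y1, x))$S)   # TRUE
all.equal(removed$S, cluster_stats(Y1)$S)           # TRUE
```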

² Only density families for which formulas are presented above are used.

³ A cluster is usually removed if its size falls below some percentage of all data, which was usually
fixed at 5%.

⁴ This corresponds to the case when the point $x$ is added to the cluster $X_i$.

⁵ This corresponds to the case when the point $x$ is removed from the cluster $X_i$.
3 The CEC package

In this section the implementation of the CEC algorithm in the R package CEC is presented.
Consider first the Old Faithful data in $\mathbb{R}$, see Azzalini and Bowman (1990). Observe that the
data distribution resembles a mixture of two Gaussians, see Figure 3.

In the basic use of this package, the input data set x and the initial number of clusters centers
have to be specified: cec(x = ..., centers = ...). Below, a simple R session is presented,
where the waiting component of the Old Faithful data set is split into two clusters.

R> library("CEC")
R> cec <- cec(matrix(faithful$waiting), 2)
R> print(cec)
CEC clustering result:

Clustering vector:
  [1] 1 2 1 2 1 2 1 1 2 1 2 1 1 2 1 2 2 1 2 1 2 2 1 1 1 1 2 1 1 1 1 1 2 1
 [35] 1 2 2 1 2 1 1 2 1 2 1 1 2 2 1 2 1 1 2 1 2 1 1 2 1 1 2 1 2 1 2 1 1 1
 [69] 2 1 1 2 1 1 2 1 2 1 1 1 1 1 1 2 1 1 1 1 2 1 2 1 2 1 2 1 1 1 2 1 2 1
[103] 2 1 1 2 1 2 1 1 1 2 1 1 2 1 2 1 2 1 2 1 1 2 1 1 2 1 2 1 2 1 2 1 2 1
[137] 2 1 2 1 1 2 1 1 1 2 1 2 1 2 1 1 2 1 1 1 1 1 2 1 2 1 2 1 2 1 2 1 2 1
[171] 2 2 1 1 1 1 1 2 1 1 2 1 1 1 2 1 1 2 1 2 1 2 1 1 1 1 1 1 2 1 2 1 1 2
[205] 1 2 1 1 2 1 1 1 2 1 2 1 2 1 2 1 2 1 2 1 1 1 1 1 1 1 1 2 1 2 1 2 2 1
[239] 1 2 1 2 1 2 1 1 2 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 2 1 2 2 1 1 2 1 2 1

Probability vector:
[1] 0.6360294 0.3639706

Means of clusters:
         [,1]
[1,] 80.20809
[2,] 54.62626

Cost function at each iteration:
[1] 3.820302 3.817422 3.817422

Number of clusters at each iteration:
[1] 2 2 2

Number of iterations:
[1] 2

Computation time:
[1] 0

Available components:
[1] "data"                "cluster"             "probabilities"
[4] "centers"             "cost.function"       "nclusters"
[7] "final.cost.function" "final.nclusters"     "iterations"

Figure 3: The Old Faithful waiting data fitted with a CEC model (histogram of the time between
Old Faithful eruptions, in minutes).


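As its main outcome, CEC returns the cluster membership of the data, cec$cluster, which corresponds
to the function $\mathrm{cl} \colon X \to \{1, \dots, k\}$ from the previous section. The following parameters of the
clusters $X_1, \dots, X_k$ are obtained as well:

• $p_i = \frac{|X_i|}{|X|}$ (probabilities of clusters);

• $m_{X_i}$: means of clusters;

• $\Sigma_{X_i}$: covariances of clusters.

The above are necessary to obtain the calculated subdensity

$$f = \max\big(p_1 \cdot \mathcal{N}(m_{X_1}, \Sigma_{X_1}), \dots, p_k \cdot \mathcal{N}(m_{X_k}, \Sigma_{X_k})\big),$$

which can be used to compute the cost function (given by the cross-entropy) or to identify
the cluster membership of new points (a point $x$ belongs to the cluster for which the
value $p_i \cdot \mathcal{N}(m_{X_i}, \Sigma_{X_i})(x)$ is maximal). Moreover, the above can be used to compute the density
estimation

$$f = p_1 \cdot \mathcal{N}(m_{X_1}, \Sigma_{X_1}) + \dots + p_k \cdot \mathcal{N}(m_{X_k}, \Sigma_{X_k}).$$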
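A minimal base R sketch of both uses: assigning new points to the cluster maximizing $p_i \cdot \mathcal{N}(m_{X_i}, \Sigma_{X_i})(x)$ and evaluating the mixture density estimate. The parameter values are illustrative assumptions; for a fitted model they would be taken from the stored means, covariances and probabilities. The helpers `dmvnorm_base` and `weighted_dens` are hypothetical.

```r
# Gaussian density N(m, S) evaluated at the rows of x
dmvnorm_base <- function(x, m, S) {
  x <- as.matrix(x)
  N <- ncol(x)
  d <- sweep(x, 2, m)
  q <- rowSums((d %*% solve(S)) * d)          # squared Mahalanobis distances
  exp(-0.5 * q) / sqrt((2 * pi)^N * det(S))
}
# p_i * N(m_i, S_i)(x): one row per point, one column per cluster
weighted_dens <- function(x, p, means, covs) {
  sapply(seq_along(p), function(i) p[i] * dmvnorm_base(x, means[i, ], covs[[i]]))
}

p     <- c(0.6, 0.4)                                # illustrative parameters
means <- rbind(c(0, 0), c(4, 4))
covs  <- list(diag(2), matrix(c(1, 0.5, 0.5, 1), 2))
new_x <- rbind(c(0.2, -0.1), c(3.8, 4.1), c(2, 2))  # new points (at least two rows)

w <- weighted_dens(new_x, p, means, covs)
cluster  <- max.col(w, ties.method = "first")       # membership: argmax of p_i * N_i(x)
dens_est <- rowSums(w)                              # mixture density estimate
```

The two outputs differ only in how the weighted densities are combined: the maximum decides membership, while the sum gives the density estimate.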

The parameters of the CEC model are stored as:

1. a list of means (cec$centers, namely $m_{X_i}$ for $i = 1, \dots, k$),

2. a list of covariances (cec$covariances.model, namely $\Sigma_{X_i}$ for $i = 1, \dots, k$),

3. a list of probabilities (cec$probability, namely $p_i$ for $i = 1, \dots, k$).

Some additional information is also returned: the number of iterations, and the cost (energy)
function and the number of clusters at consecutive iterations.

Below, an R session is presented which shows how to use the above parameters for plotting
the data and the Gaussian models corresponding to the clusters.

R> hist(faithful$waiting, prob = TRUE, xlab = "Minutes", ylim = c(0, 0.05),
+   main = "Histogram of Time between Old Faithful eruptions")
R> for (i in 1:2) {
+   curve(cec$probability[i] * dnorm(x, mean = cec$centers[i],
+     sd = sqrt(cec$covariances[[i]][1])), add = TRUE, col = i + 1)
+ }

As mentioned before, the discussed method, analogously to k-means, depends on the initial cluster
memberships. Therefore, the clustering should be started a few times, which can be
done with the parameter nstart (e.g., cec <- cec(x = ..., centers = ...,
nstart = ...)). The initial cluster membership function can be chosen with the parameter
centers.init, either randomly ("random") or with the method given by the k-means++
algorithm (Arthur and Vassilvitskii 2007) ("kmeans++").

Two more parameters are important in the initialization. The first one, iter.max = 100, is
the maximum number of iterations in one CEC start, and the second one, card.min = "5%", is the
minimal size of each cluster, given as a percentage of the data set. Clusters which contain
fewer points are removed. Since each cluster is described by a covariance matrix, the number
of elements in each cluster must be larger than the dimension of the data.

One of the most important properties of the CEC algorithm is that it can be applied with
various Gaussian models. Therefore, the CEC package includes the implementation of six
Gaussian models, which can be specified by the parameter type. All the models implemented
in the CEC package are discussed below.

$\mathcal{G}$ - General Gaussian distributions

The family $\mathcal{G}$ containing all Gaussian distributions is considered first. The general
Gaussian CEC algorithm gives results similar to those obtained by Gaussian Mixture
Models. However, the authors' method does not use the EM (Expectation Maximization)
approach for minimization but a simple iterative process (Hartigan's method). Consequently,
larger data sets can be processed in shorter time.

The clustering will have the tendency to divide the data into clusters in the shape of ellipses
(ellipsoids in higher dimensions).

R> library("CEC") 

R> data("fourGaussians") 

R> cec <- cecCfourGaussians, centers = 10, type = "all", nstart = 20) 

R> plot(cec, xlim = c(0, 1), ylim = c(0, 1), asp = 1) 

R> cec.plot.cost.function(cec) 
Figure 4: Clustering with respect to the general Gaussian model: (a) randomly generated four
Gaussians data set; (b) effect of CEC with an initial number of k = 10 clusters; (c) decrease of
the cost function at each iteration.


This model can be used for exploring the data structure when no information about the
relations in the data set is available. After analysing the outcome, one can decide
to use more specific types of Gaussian families.

The results of CEC with various types of Gaussian models on the T-shaped data set Tset are
presented in Fig. 5. The figure was generated by the following code in R:

R> library("CEC")
R> data("Tset")


• spherical CEC: 


R> cec <- cec(x = Tset, centers = 10, type = "spherical") 

R> plot(cec, xlim = c(0, 1), ylim = c(0, 1), asp = 1)

• spherical CEC with fixed radius: 

R> cec <- cec(x = Tset, centers = 10, type = "fixedr", param = 0.01) 
R> plot(cec, xlim = c(0, 1), ylim = c(0, 1), asp = 1) 

• diagonal CEC: 

R> cec <- cec(x = Tset, centers = 10, type = "diagonal") 

R> plot(cec, xlim = c(0, 1), ylim = c(0, 1), asp = 1)

• fixed covariance CEC: 


R> cec <- cec(x = Tset, centers = 10, type = "covariance",
+   param = matrix(c(0.04, 0, 0, 0.01), 2))
R> plot(cec, xlim = c(0, 1), ylim = c(0, 1), asp = 1)


• fixed eigenvalue CEC:

R> cec <- cec(x = Tset, centers = 10, type = "eigenvalues",
+   param = c(0.01, 0.001))
R> plot(cec, xlim = c(0, 1), ylim = c(0, 1), asp = 1)

Figure 5: The CEC algorithm in the case of clustering a T-type set according to the various
types of the CEC model: (a) the T-data set; (b) effect of the spherical CEC; (c) effect of the
spherical CEC with fixed radius; (d) effect of the diagonal CEC; (e) effect of the fixed covariance
CEC; (f) effect of the fixed eigenvalue CEC.


$\mathcal{G}_{(\cdot I)}$ - Spherical Gaussians

The second family discussed contains the spherical Gaussian distributions $\mathcal{G}_{(\cdot I)}$, which can be
accessed by cec(x = ..., centers = ..., type = "spherical"). The original distribution
will be estimated by spherical (radial) densities, which results in splitting the data into
circle-like clusters of arbitrary sizes (balls in higher dimensions).

In Fig. 5(b) the result of the spherical algorithm with circles fitted to the obtained clusters is
presented. This family can be used for the recognition of circular objects, see Spurek
et al (2013).

$\mathcal{G}_{rI}$ - Spherical Gaussians with a fixed radius

The next model implemented in the CEC package is a spherical model with a fixed covariance:
cec(x = ..., centers = ..., type = "fixedr", param = ...). Similarly to the
general spherical model, the data set will be divided into clusters resembling full circles, but
with the radius determined by param.

In Fig. 5(c) the result of the spherical fixed radius algorithm with circles fitted to the
obtained clusters is presented.

$\mathcal{G}_{\mathrm{diag}}$ - Diagonal Gaussians

The fourth model is based on diagonal Gaussian densities (e.g., cec(x = ..., centers =
..., type = "diagonal")). In this case, the data will be described by ellipses whose
axes are parallel to the axes of the coordinate system. In Fig. 5(d) the
result of the diagonal CEC with ellipses fitted to the obtained clusters is
presented.

$\mathcal{G}_{\Sigma}$ - Gaussians with fixed covariance


The next model contains Gaussians with an arbitrary fixed covariance matrix, e.g.,
cec(x = ..., centers = ..., type = "covariance", param = ...). In this example
$\Sigma = \begin{pmatrix} 0.04 & 0 \\ 0 & 0.01 \end{pmatrix}$ is used, which means that the data is covered by fixed ellipses. In Fig. 5(e) the
result of the fixed covariance CEC is presented.


$\mathcal{G}_{\lambda_1, \dots, \lambda_N}$ - Gaussian densities with fixed eigenvalues $\lambda_1, \dots, \lambda_N$


The last model is based on Gaussians with arbitrary fixed eigenvalues (e.g., cec(x = ...,
centers = ..., type = "eigenvalues", param = ...)). In this example $\lambda_1 = 0.01$ and $\lambda_2 =
0.001$ are used, which means that the set is covered by ellipses with fixed semi-axes (whose
lengths are proportional to the square roots of the eigenvalues). In Fig. 5(f) the result of the
fixed eigenvalues CEC is presented.

At the end of this section we present how our method works on data from the UCI repository.
In the first example we consider the iris data set, which consists of 50 samples from each of
three classes of iris flowers, see Fisher (1936). One class is linearly separable from the other two,
while the latter are not linearly separable from each other, see Fig. 6(a). Next we consider
three coordinates of the wine data set, analogously to the experiments in the introduction to the R
package pdfCluster (Azzalini and Menardi 2014). The wine data set was introduced by
Forina, Armanino, Castino, and Ubigli (1986). It originally included the results of 27 chemical
measurements on 178 wines grown in the same region in Italy but derived from three different
cultivars: Barolo, Grignolino and Barbera, see Fig. 6(b).
Figure 6: Clustering of iris and wine data sets according to the general CEC model: (a) iris
data set; (b) wine data set.


4 A mix of the Gaussian models 

One of the most powerful properties of the CEC algorithm is the possibility of mixing models.
More precisely, mixed models can be specified by giving a vector of cluster types:

cec(x = ..., centers = ..., type = c("all", ...), param = ...).

Fig. 7 presents the CEC clustering according to two clusters described by spherical Gaussians
with a fixed radius ($\mathcal{G}_{rI}$) with $r = 350$ and five clusters of type $\mathcal{G}_{\lambda_1, \lambda_2}$ with fixed eigenvalues
c(9000, 8). This kind of configuration can be used in many cases, especially if extensive
knowledge of the structure of the investigated set is available. Various patterns in an image
can be distinguished; for example, multiple types of objects can be detected simultaneously,
e.g., matches (Gaussians with a specified covariance matrix) and coins (spherical
Gaussians with a fixed radius) can be searched for at the same time, compare Tabor and Misztal
(2013).

Figure 7 was generated by the following code in R:

R> library("CEC")
R> data("mixShapes")
R> cec <- cec(mixShapes, 7, type = c("fixedr", "fixedr", "eigen", "eigen",
+   "eigen", "eigen", "eigen"), param = list(350, 350, c(9000, 8),
+   c(9000, 8), c(9000, 8), c(9000, 8), c(9000, 8)), nstart = 100)
R> plot(cec, asp = 1)
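When many clusters share a type, the type and param arguments can be built with rep() rather than typed out. A small base R sketch (the variable names are arbitrary) that reproduces the seven-cluster specification used above; the final call is left commented out because it needs the CEC package and its mixShapes data.

```r
# two fixed-radius clusters followed by five fixed-eigenvalue clusters
types  <- c(rep("fixedr", 2), rep("eigen", 5))
params <- c(rep(list(350), 2), rep(list(c(9000, 8)), 5))

identical(types, c("fixedr", "fixedr", "eigen", "eigen", "eigen", "eigen", "eigen"))  # TRUE
identical(params, list(350, 350, c(9000, 8), c(9000, 8), c(9000, 8),
                       c(9000, 8), c(9000, 8)))                                       # TRUE
# cec(mixShapes, 7, type = types, param = params, nstart = 100)
```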




Figure 7: The CEC algorithm in the case of clustering according to a mixed model: (a) data set
containing two types of patterns (circular and elliptical); (b) result of CEC with two Gaussians
with a fixed radius and four with fixed eigenvalues of covariance.


5 Concluding remarks 

The R package CEC presented in this work uses the cross-entropy clustering described by Tabor
and Spurek (2014). The presented method is an interesting alternative to classical clustering
methods like k-means, EM, GMM and their generalizations. Since CEC does not use the
EM method, new models can be added without the need for complicated optimization.
Another important property of CEC is the automatic reduction of clusters which have a
negative information cost.

The main advantage of the method lies in the fact that it can be easily adapted to different
Gaussian models. Thus, the package allows the user to specify which kind of Gaussian subfamilies
will be used in clustering. In particular, it is possible to use spherical Gaussians, spherical
Gaussians with a fixed radius, diagonal Gaussians, Gaussians with a fixed covariance or
Gaussians with fixed eigenvalues. Moreover, it is possible to use a combination of the above-mentioned
types of Gaussian subfamilies. The package also provides tools to visualize
the obtained results.